The study investigated the effects of several variables on the
expression of confidence in the accuracy of responses to objective
test items. A final examination was administered to 72
subjects under confidence-weighting instructions (Ebel, 1965)
with two levels of penalty for incorrect responses. A two-way
ANOVA revealed no significant main effects or interaction
attributable to level of penalty or sex. A multiple correlation
of .39 was obtained between an ascendance score, based on a
composite of scales from the California Psychological Inventory,
and a score based on the number of incorrect responses
for which maximum confidence was expressed. An ANOVA of the
regression yielded a significant F (p < .05). Although
the increased penalty level had no effect on confidence-expression,
the reliability of confidence-weighted scores decreased from .85 to .39,
and the correlation between conventional and weighted scores
dropped from .88 to .095.



One of the current problems in educational and psychological 
measurement is the assessment of partial knowledge or degree of mastery 
of material tapped by objective test items. 

A strategy which has been advanced as a technique for extracting
additional information from objective test item responses, and as
a means of increasing test reliability, is known as confidence-weighting
and is usually attributed to Ebel (1965a, 1965b). Confidence-weighting
is described as

"...a special mode of responding to objective test items,
and a special mode of scoring those responses. In general
terms, the examinee is asked to indicate not only what he
believes to be the correct answer to a question, but also
how certain he is of the correctness of his answer. When
his answers are scored he receives more credit for a correct
answer given confidently than for one given diffidently.
But the penalty for an incorrect answer given
confidently is heavy enough to discourage unwarranted pretense
of confidence." (Ebel, 1965a, p. h$).

Alternative scoring procedures, such as confidence-weighting,
are often regarded as relatively recent measurement innovations. This
is not completely accurate. For example, confidence-weighting has a long
history in psychophysical experimentation (e.g., Henmon, 1911;
Hollingworth, 1913; Trow, 1923). The technique was thoroughly
investigated, both directly and indirectly, in a long line of studies
in educational and psychological measurement which was interrupted in
the early 1940s, probably by the diversion of research talent into
activities related to World War II (e.g., Greene, 1929; Jersild, 1929;
Hevner, 1932; Melbo, 1933; Wiley and Trimble, 1936; Soderquist, 1936;
Swineford, 1938, 1941; Meyer, 1939; Johnson, 1940, 1941).



More recently, a number of studies have appeared which imply
that alternative scoring procedures such as confidence-weighting may
serve to make measurement more precise (e.g., Michael, 1968). There
seems to be little research aimed at identifying the relevant factors
operative in situations where the respondent is given some latitude
in responding. Several studies indicate that reliable individual
differences in risk-taking may be operative (Slakter, 1968a, 1968b),
and that some alternative scoring procedures yield information which
differs from that obtained via conventional means (Rippey, 1968).

There appear to be at least the following assumptions made
in the use of confidence-weighting:

1) Students will perform in a rational manner in a
confidence-weighting situation, i.e., a high relationship
exists between the possession of knowledge and a willingness
to express this fact; and

2) the technique is not contaminated through the introduction
of extraneous variables which may bias the
situation for or against certain students, regardless of
achievement.

A number of the studies cited reveal that these assumptions are not
tenable. One may conclude that any situation in which the subject is
permitted some latitude in responding, such as confidence-weighting, is
moderated by factors usually extraneous to the test. This has, in fact,
been amply demonstrated by Votaw (1936) and Sherriffs and Boomer (1954)
in investigations concerning the effects of the correction for guessing.
A number of studies (e.g., Soderquist; Wiley and Trimble; Swineford;
Slakter) have posited the operation of a personality variable; several
ranking indices have been suggested (Slakter, 1967), but there has been
little research designed to investigate the internal validity of
confidence-weighting.
The present study was undertaken to determine the effects of
two levels of penalty on the unwarranted expression of confidence, the
personality correlates of confidence-expression, and the effects of
confidence-weighting on test statistics.



Method 

Subjects 

The 72 subjects in the present study were 24 male and 48 female
undergraduates, predominantly freshmen and sophomores, enrolled in an
introductory course in personality and adjustment.

Procedure 

Ss were administered a 130-item multiple-choice course final
examination under confidence-weighting instructions. Ss were given a
general explanation of the technique; instructional paragraphs preceding
the examination detailed the credit and penalty allowances, with
examples.

In addition to selecting what was perceived as the most correct
answer for each test item, Ss indicated their degree of confidence in
their response on a three-point scale (Guess, Fairly Confident, Very
Confident), with graded credit and penalty. A randomly selected 36 Ss
(Group A) could earn 1 point for a correct response marked "Guess,"
2 points for "Fairly Confident" and 3 points for "Very Confident." If
the selected option was incorrect, Ss lost 0, 2 or 3 points, depending
on the category of confidence selected. The remaining 36 Ss (Group B)
were informed that the penalties were 0, 4 or 6 points, again depending
upon the confidence category selected.
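The credit and penalty schedules just described can be sketched as a small scoring function. This is an illustrative sketch only: the names `CREDIT`, `PENALTY`, `item_score` and `weighted_total` are hypothetical, the point values are those stated above, and omitted items are not handled because the procedure does not say how they were scored.

```python
# Illustrative sketch of the confidence-weighting schedules described above.
# All names are hypothetical; point values follow the Procedure section.

CREDIT = {"Guess": 1, "Fairly Confident": 2, "Very Confident": 3}

# Points lost for an incorrect response, by penalty group.
PENALTY = {
    "A": {"Guess": 0, "Fairly Confident": 2, "Very Confident": 3},
    "B": {"Guess": 0, "Fairly Confident": 4, "Very Confident": 6},
}


def item_score(correct: bool, confidence: str, group: str) -> int:
    """Return the credit (positive) or penalty (negative) for one item."""
    if confidence not in CREDIT:
        raise ValueError(f"unknown confidence category: {confidence!r}")
    if group not in PENALTY:
        raise ValueError(f"unknown group: {group!r}")
    return CREDIT[confidence] if correct else -PENALTY[group][confidence]


def weighted_total(responses, group: str) -> int:
    """Sum item scores over (correct, confidence) pairs for one examinee."""
    return sum(item_score(correct, conf, group) for correct, conf in responses)


# Hypothetical four-item record for one examinee.
example = [
    (True, "Very Confident"),
    (False, "Very Confident"),
    (False, "Guess"),
    (True, "Fairly Confident"),
]
print(weighted_total(example, "A"), weighted_total(example, "B"))
```

For this hypothetical record the same answers earn 2 points under the Group A schedule and -1 under Group B; only the confidently wrong answer is treated differently.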
All Ss had completed the California Psychological Inventory
(CPI) (Gough, 1957) as part of freshman orientation; these data were
obtained from University records. For each subject, standard scores on
the Dominance (Do), Sociability (Sy), Self-acceptance (Sa) and
Intellectual Efficiency (Ie) scales were combined as a measure
of ascendance (Crites, 1961, 1964).

The measure of confidence-expression used in the present study
was defined as

$$\mathrm{CONF} = \frac{\text{number of errors for which maximum confidence was expressed}}{\text{number of errors}}$$
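Computed per examinee, the index is a simple proportion. The sketch below is not the authors' scoring procedure: the names are hypothetical, and returning `None` for an examinee with no errors is an assumption, since the text does not say how such cases were treated. The means in Table 1 (roughly 10 to 18) suggest the ratio may have been reported on a percentage scale, so an optional rescaling flag is included, again as an assumption.

```python
# Illustrative sketch of the CONF index; names are hypothetical.
def conf_score(responses, max_category="Very Confident", percent=False):
    """Proportion of an examinee's errors marked with maximum confidence.

    `responses` holds (correct, confidence) pairs. Returns None when there
    are no errors, because the ratio is then undefined. `percent=True`
    rescales to 0-100 (an assumption; the text does not say this was done).
    """
    errors = [conf for correct, conf in responses if not correct]
    if not errors:
        return None
    ratio = sum(1 for conf in errors if conf == max_category) / len(errors)
    return 100 * ratio if percent else ratio


# Hypothetical five-item record: four errors, two of them marked "Very Confident".
example = [
    (True, "Very Confident"),
    (False, "Very Confident"),
    (False, "Guess"),
    (False, "Very Confident"),
    (False, "Fairly Confident"),
]
print(conf_score(example), conf_score(example, percent=True))
```

Correct answers are ignored entirely, so the index describes only how confidently an examinee errs, not how many items were answered correctly.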

Results 

Table 1 displays the means and standard deviations of CONF scores
obtained from male and female Ss under the two levels of penalty. The
hypothesis of no treatment, no sex and no treatment × sex effects was
tested using a two-way ANOVA and retained at the .05 level (see Table 2).

TABLE 1

CONF Scores for Groups A and B by Sex

Group, Sex       N     Mean    Std. Deviation
A, males        12    16.15         7.9
A, females      24    10.13         9.5
B, males        12    15.43        10.1
B, females      24    17.82        13.8
TABLE 2

ANOVA Testing Effects of Magnitude of Penalty
and Sex of Subject on CONF Scores

Source           df        MS       F
Treatment (A)     1      3.04    0.03
Sex (B)           1    193.67    1.57
A × B             1    282.80    2.29
Within           68    123.64
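Because each effect in Table 2 has one degree of freedom, each F is the effect mean square divided by the within-cells mean square, and its upper-tail probability follows from the identity F(1, ν) = t(ν)². The sketch below recomputes the ratios from the tabled mean squares using only the Python standard library; integrating the t density with Simpson's rule is an implementation choice made here, not part of the original analysis. It reproduces the Sex and A × B ratios; the treatment ratio comes out near 0.02 rather than the printed 0.03, a small difference that leaves the conclusion unchanged.

```python
import math

# Mean squares from Table 2; the names are illustrative.
MS_WITHIN, DF_WITHIN = 123.64, 68
EFFECTS = {"Treatment (A)": 3.04, "Sex (B)": 193.67, "A x B": 282.80}


def t_pdf(x, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def p_value_f1(f_ratio, df_error):
    """Upper-tail p for F(1, df_error), using F(1, v) = t(v) squared."""
    t = math.sqrt(f_ratio)
    central = 2 * simpson(lambda x: t_pdf(x, df_error), 0.0, t)
    return max(0.0, 1.0 - central)


results = {}
for source, ms in EFFECTS.items():
    f_ratio = ms / MS_WITHIN
    results[source] = (f_ratio, p_value_f1(f_ratio, DF_WITHIN))
    print(f"{source:14s} F = {f_ratio:5.2f}  p = {results[source][1]:.3f}")
```

All three probabilities exceed .05, consistent with retaining the null hypothesis for treatment, sex and their interaction.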



The relationship between ascendance, as defined in the present
study, and the expression of unwarranted confidence was investigated
using multiple correlation (see Table 3). An F-test on the regression
of ascendance on CONF scores yielded an F significant at the .05 level
(see Table 4).



TABLE 3

Multiple Correlation Between "Ascendance"
Composite Scores and CONF Scores

Variable    Multiple Correlation (R)    F Value
Ie                  .22                   3.62
Do                  .35                   4.69
Sa                  .38                   3.71
Sy                  .39                   2.93



TABLE 4

ANOVA for Regression of "Ascendance"
Composite Scores on CONF Scores

Source                         df        MS        F
Due to regression               4    331.07    2.93*
Deviation about regression     67    112.87

*p < .05
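Both tables can be checked with the usual overall test for a multiple correlation, F = (R²/k) / ((1 − R²)/(n − k − 1)) on k and n − k − 1 degrees of freedom. The sketch assumes, as the steadily increasing values suggest but the text does not state, that Table 3 lists the cumulative R as each CPI scale is added, with n = 72. With k = 4 the error degrees of freedom are 67, as in Table 4, and the ratio of Table 4's mean squares, 331.07/112.87, gives F ≈ 2.93. The p-value uses a plain Simpson's-rule integration of the F density, chosen here only to keep the example dependent on the standard library.

```python
import math

N_CASES = 72  # number of examinees, from the Method section

# Table 3 rows, read (by assumption) as cumulative R after each scale enters.
TABLE_3 = [("Ie", 0.22, 3.62), ("Do", 0.35, 4.69), ("Sa", 0.38, 3.71), ("Sy", 0.39, 2.93)]


def f_from_r(r, k, n):
    """Overall F for a multiple correlation r with k predictors and n cases."""
    r2 = r * r
    return (r2 / k) / ((1 - r2) / (n - k - 1))


def f_pdf(x, d1, d2):
    """Density of F(d1, d2); taken as 0 at x = 0, which holds for d1 > 2."""
    if x <= 0:
        return 0.0
    log_beta = math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    return math.exp((d1 / 2) * math.log(d1 / d2)
                    + (d1 / 2 - 1) * math.log(x)
                    - ((d1 + d2) / 2) * math.log1p(d1 * x / d2)
                    - log_beta)


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def p_value_f(f_ratio, d1, d2):
    """Upper-tail probability of F(d1, d2) beyond f_ratio."""
    if d1 <= 2:
        raise ValueError("this simple integration assumes d1 > 2")
    return max(0.0, 1.0 - simpson(lambda x: f_pdf(x, d1, d2), 0.0, f_ratio))


for k, (scale, r, f_printed) in enumerate(TABLE_3, start=1):
    print(f"{scale}: k = {k}, F from R = {f_from_r(r, k, N_CASES):.2f}, printed {f_printed}")

# Table 4: F is the ratio of the regression and residual mean squares.
f_table4 = 331.07 / 112.87
p_table4 = p_value_f(f_table4, 4, N_CASES - 4 - 1)
print(f"Table 4: F = {f_table4:.2f}, p = {p_table4:.3f}")
```

The F values recomputed from R differ from the printed ones by roughly a tenth, which is what rounding R to two decimals would produce; the mean-square ratio reproduces 2.93 exactly, with p between .01 and .05.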



Split-half reliability estimates for unweighted and confidence-weighted
achievement test scores, and for CONF scores, in Groups A and B
are summarized in Table 5.



TABLE 5

Split-Half Reliability Estimates, Groups A and B

Variable                                    A      B
Unweighted achievement scores              .89    .79
Confidence-weighted achievement scores     .87    .39
CONF scores                                .86    .68
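The text does not say how the tests were split or whether the half-test correlations were stepped up to full length. A common approach, assumed here rather than taken from the study, is an odd-even split followed by the Spearman-Brown correction, r_full = 2·r_half / (1 + r_half). The sketch below uses hypothetical names and works on any matrix of per-item scores, so it applies equally to unweighted (0/1) and confidence-weighted item scores.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a half-test correlation up to the reliability of the full test."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """Odd-even split-half reliability; item_scores has one row per examinee."""
    odd = [sum(row[0::2]) for row in item_scores]
    even = [sum(row[1::2]) for row in item_scores]
    return spearman_brown(pearson(odd, even))


# Hypothetical confidence-weighted item scores: five examinees, six items each.
scores = [
    [3, 2, -3, 3, 1, 2],
    [1, 1, 0, -2, 2, 1],
    [3, 3, 2, 3, 3, 2],
    [-3, 0, 1, 1, -2, 0],
    [2, 3, 3, 2, 0, 1],
]
print(round(split_half_reliability(scores), 2))
```

A different split (for example, first half against second half) can give a noticeably different estimate, so the choice of split matters when comparing coefficients across studies.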



Discussion 

It is apparent that, at least with naive Ss, moderate shifts
in the level of penalty associated with incorrect responses do not
modify the expression of unwarranted confidence. An additional
analysis of the frequency of usage of the three possible confidence
categories in Groups A and B revealed no significant difference. It
should be pointed out that these were naive Ss, who lacked any
"baseline," based on previous experience, for their expression of
confidence. Graded levels of penalty may have some effect on students
with prior experience with confidence-weighting.

It appears that the expression of confidence in the accuracy
of one's responses to objective test items is contaminated by a more
general personality factor. The procedure seems biased against
ascendant Ss, since a greater number of their incorrect responses are
given with maximum confidence, which results in a greater incurred
penalty.

One of the often-cited reasons for employing confidence-weighting,
that of affording an increase in test reliability, finds
no support in the present study.

As may be seen in Table 5, the data, with one exception, were
quite reliable. Of interest is the difference in internal consistency
of confidence-weighted achievement scores between Groups A and B.
Apparently, the asymmetric credit-penalty relationship in Group B served
to magnify inconsistencies in the way confidence-weighting was employed.
To clarify this point, the correlation between conventional and
confidence-weighted scores was obtained; it was .88 for Group A and
.095 for Group B.
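The comparison of conventional and confidence-weighted scores amounts to scoring the same answer sheets twice, once as number right and once with a credit-penalty schedule, and correlating the two totals. The sketch below does this for four hypothetical examinees under both schedules described in the Procedure; the data and names are invented for illustration, and the sketch makes no attempt to reproduce the study's .88 and .095.

```python
import math

CREDIT = {"Guess": 1, "Fairly Confident": 2, "Very Confident": 3}
PENALTY = {
    "A": {"Guess": 0, "Fairly Confident": 2, "Very Confident": 3},
    "B": {"Guess": 0, "Fairly Confident": 4, "Very Confident": 6},
}


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def number_right(responses):
    """Conventional score: count of correct answers, confidence ignored."""
    return sum(1 for correct, _ in responses if correct)


def weighted(responses, group):
    """Confidence-weighted total under the given group's penalty schedule."""
    return sum(CREDIT[c] if ok else -PENALTY[group][c] for ok, c in responses)


# Hypothetical (correct, confidence) records, five items per examinee.
examinees = [
    [(True, "Very Confident"), (True, "Fairly Confident"), (False, "Guess"),
     (True, "Very Confident"), (False, "Very Confident")],
    [(True, "Guess"), (False, "Fairly Confident"), (True, "Guess"),
     (False, "Very Confident"), (True, "Fairly Confident")],
    [(True, "Very Confident"), (True, "Very Confident"), (True, "Fairly Confident"),
     (False, "Guess"), (True, "Guess")],
    [(False, "Very Confident"), (False, "Very Confident"), (True, "Fairly Confident"),
     (False, "Guess"), (True, "Very Confident")],
]

right = [number_right(r) for r in examinees]
for group in ("A", "B"):
    totals = [weighted(r, group) for r in examinees]
    print(group, totals, round(pearson(right, totals), 2))
```

Because the Group B schedule doubles the penalties for confident errors, examinees who err confidently drop further under B than under A, which is the kind of divergence from number-right ordering that the Discussion attributes to the asymmetric schedule.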

In summary, there seems to be little advantage in the use of
confidence-weighting as a means of tapping partial knowledge, since
it is contaminated or moderated by personality. Reasonable levels of
penalty do not effectively control the expression of unwarranted
confidence, but increased penalties destroy the internal consistency of
the data and result in scores which bear no relationship to basic
"number right" scores. The situation with reference to the reliability
gains noted with confidence-weighting is unclear; it may be hypothesized
that such gains would occur maximally with tests of low reliability,
where the introduction of an independent, reliable (but extraneous)
source of variance might result in marked shifts in estimated reliability.

The time and effort required both to complete and to score a
confidence-weighted test are substantially greater than those required
for a conventional test. The present study finds no advantage in terms
of the amount or precision of information.

Perhaps the best way to deal with individual differences in
risk-taking propensities is to require Ss to respond to all items, even
though this may increase error variance somewhat and thereby decrease
reliability. With reference to increased precision, it appears that
the conventional route of item analysis and revision is less suspect.